Add embeddings for a few economics indicators #4971

Open: beets wants to merge 1 commit into master

@beets (Contributor) commented Feb 20, 2025

Add embeddings to base for a few economic indicators.
Attached reports with diffs below.

SV Index Differ.pdf
NL Eval Playground - Data Commons.pdf

@chejennifer (Contributor) left a comment

Nice. Just wondering whether all the descriptions for each statistical variable are necessary, or whether we can clean this up and keep one good description per variable.

@@ -3387,6 +3387,7 @@ sdg/ER_PTD_FRHWTR,Average proportion of Freshwater Key Biodiversity Areas covere
sdg/ER_PTD_MTN,Average proportion of Mountain Key Biodiversity Areas covered by protected areas
sdg/ER_PTD_TERR,Average proportion of Terrestrial Key Biodiversity Areas covered by protected areas
sdg/ER_RSK_LST,Red List Index
sdg/FP_CPI_TOTL_ZG,"Annual inflation, consumer prices;Annual inflation rate as measured by the consumer price index;Percentage change in the cost to the average consumer of acquiring a basket of goods and services that may be fixed or changed at specified intervals, such as yearly;Annual inflation rate consumer price index"

The first description ("Annual inflation, consumer prices"), the second ("Annual inflation rate as measured by the consumer price index"), and the last ("Annual inflation rate consumer price index") are all quite similar. Are they all needed? When we overhauled the embeddings last year, the goal was one good, distinct description per variable, maybe two if necessary.
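
One rough way to spot this kind of overlap before review is to compare the descriptions within a row by word overlap. The sketch below is not part of the repository's tooling: the helper names, the token-based Jaccard measure, and the 0.4 threshold are all assumptions chosen for illustration, and word overlap is only a crude stand-in for actual embedding similarity. It assumes the row format shown in the diff: a variable ID, then a `;`-separated list of descriptions.

```python
import csv
import io
import re
from itertools import combinations

# Row copied from the diff above: "<variable id>,<descriptions joined by ';'>".
ROW = (
    'sdg/FP_CPI_TOTL_ZG,"Annual inflation, consumer prices;'
    "Annual inflation rate as measured by the consumer price index;"
    "Percentage change in the cost to the average consumer of acquiring a "
    "basket of goods and services that may be fixed or changed at specified "
    "intervals, such as yearly;"
    'Annual inflation rate consumer price index"'
)


def tokens(text):
    """Return the set of lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-level Jaccard similarity between two descriptions (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(row_text, threshold=0.4):
    """List description pairs in one row whose word overlap meets the threshold.

    The threshold is a hypothetical cutoff, not a project setting.
    """
    # csv handles the quoted second field, which itself contains commas.
    _variable_id, sentences = next(csv.reader(io.StringIO(row_text)))
    descriptions = [s.strip() for s in sentences.split(";") if s.strip()]
    flagged = []
    for a, b in combinations(descriptions, 2):
        score = jaccard(a, b)
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    for first, second, score in near_duplicates(ROW):
        print(f"{score}: {first!r} vs {second!r}")
```

On this row, the check pairs the last description with each of the first two, while the long definitional sentence is not flagged against any of them. A real check would more likely compare the embedding vectors themselves.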

@@ -3602,7 +3603,11 @@ worldBank/4_1_SHARE_RE_IN_ELECTRICITY,Renewable electricity share of total elect
worldBank/EG_ELC_ACCS_RU_ZS,percentage of rural population with access to electricity
worldBank/EG_ELC_ACCS_UR_ZS,percentage of urban population with access to electricity
worldBank/EG_ELC_ACCS_ZS,percentage of population with access to electricity
worldBank/FR_INR_DPST,"Deposit interest rate;Deposit interest rate is the rate paid by commercial or similar banks for demand, time, or savings deposits"

Also, I'm not sure the definition of what a deposit interest rate is needs to be included. The more sentences we add, the harder it is to keep variables in distinct regions of the embedding space.
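
A simple length check can surface definitional sentences like this one for a second look. This is only a sketch: the function name and the 8-word limit are hypothetical, not an existing project rule, and a long description is not necessarily a bad one.

```python
def long_descriptions(sentences, max_words=8):
    """Return the descriptions in a ';'-separated list that exceed max_words.

    max_words is an arbitrary illustrative limit, not a project convention.
    """
    descriptions = [s.strip() for s in sentences.split(";") if s.strip()]
    return [d for d in descriptions if len(d.split()) > max_words]


# Description field copied from the worldBank/FR_INR_DPST row in the diff above.
FR_INR_DPST = (
    "Deposit interest rate;Deposit interest rate is the rate paid by "
    "commercial or similar banks for demand, time, or savings deposits"
)

if __name__ == "__main__":
    for description in long_descriptions(FR_INR_DPST):
        print(description)
```

Here the short name "Deposit interest rate" passes and only the full definition is reported.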


2 participants